
<html>
<head>
<title>
"Median Years to Ph.D." is not what you think!
</title>
</head>

<body bgcolor="#ffffff" text="#000070">

<h1>"Median Years to Ph.D." in new Conference Board
study of doctorate programs is not what you think!</h1>

<h3>
<a href="http://www.cs.washington.edu/homes/lazowska/lazowska.html">Ed
Lazowska</a>
<br>
<a href="http://www.cs.washington.edu/index.html">Department
of Computer Science &amp; Engineering</a>
<br>
<a href="http://www.washington.edu/index.html">University
of Washington</a>
</h3>

<h4>
September 1995
</h4>

<p>
<hr>
<p>

The just-released Conference Board study of research-doctorate
programs in the United States
includes a measure for each program labeled <b>"Median
Years to Degree,"</b> which is widely interpreted to be "the
median number of years that students spend in this graduate
program."  <b>Don't be fooled!</b>

<p>
In fact, what the study reports is best
described as <b>"the median number of years that elapse
from when the
student first enters any educational program in any field
at any institution after receiving his/her Bachelor's degree,
until the student receives his/her Ph.D."</b>

<p>
Suppose, for example, that a student enters a Master's
program immediately after receiving his/her Bachelor's
degree, and graduates from this Master's program in 2
years.  Then the student enters the workforce for 5
years.  Wanting to make the transition to research
(perhaps in an entirely different field from the
Master's degree!),
the student then enrolls in a Ph.D. program, from
which s/he graduates 4 years later.  The Ph.D.-granting
institution probably feels pretty
good -- cranked this student out in 4 years!  But in
the Conference Board study, this student will weigh
in at 2 + 5 + 4 = 11 years!
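
<p>
The arithmetic in this example can be sketched in a few lines of
Python.  This is only an illustration:  the record layout, the field
names, and the calendar years are hypothetical, and the small cohort
is made up to show how the two medians can diverge.  It is not the
study's actual procedure or data.

```python
from statistics import median

# Hypothetical record layout: each student is a dict of calendar years.
# The field names are illustrative and are not taken from the SED form.

def years_in_program(student):
    """Years between entering the Ph.D. program and receiving the Ph.D."""
    return student["phd_awarded"] - student["phd_entry"]

def years_since_first_enrollment(student):
    """Years between first post-Bachelor's enrollment anywhere and the Ph.D."""
    return student["phd_awarded"] - student["first_postbac_entry"]

# The student from the example: 2 years of Master's study, 5 years of work,
# then 4 years in the Ph.D. program.  The calendar years are made up.
example = {"first_postbac_entry": 1980, "phd_entry": 1987, "phd_awarded": 1991}

# A made-up cohort: the example student plus two who entered more directly.
cohort = [
    example,
    {"first_postbac_entry": 1985, "phd_entry": 1985, "phd_awarded": 1990},
    {"first_postbac_entry": 1984, "phd_entry": 1986, "phd_awarded": 1991},
]

median_in_program = median(years_in_program(s) for s in cohort)
median_elapsed = median(years_since_first_enrollment(s) for s in cohort)
```

<p>
For the example student, <code>years_in_program</code> gives 4 while
<code>years_since_first_enrollment</code> gives 11.  The same records
yield two quite different medians depending on which starting point
is used.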

<p>
This semantic confusion is one issue:  we all attribute a
particular semantics to the "Median Years to Degree" ("MYD")
measure, which is not at all what it actually represents.
A careful definition of "MYD" would reduce confusion somewhat,
but many would still be confused by the misleading title.
And even if the semantic confusion could be cleared up,
my opinion is that
the "MYD" measure does not convey information that
characterizes in a meaningful way the graduate program
to which it is attached:  it is not relevant to a
student trying to choose
between graduate programs, nor to an administrator
looking for bloated programs.
I might even argue that it does not represent something
worth tabulating and reporting at all, and that it's confusing
to do so.  (Looking more broadly than individual graduate
programs, I question whether "MYD" is even germane to a field as
a whole, since so many factors contribute to the measure.)

<h3>
Background
</h3>

In the Department of Computer Science &amp; Engineering
at the University of Washington, we routinely calculate
the median time that students spend in our doctoral
program.  This number has been stable at between 5 and 6
years for more than a decade.  (We do not require a Master's
degree en route to a Ph.D., so this number represents
"total time in graduate school" for a student who enters
directly from a Bachelor's program.  There are no tricky
semantics here -- it's exactly what you'd expect.)

<p>
We were, therefore, surprised when the new Conference Board
study reported 8.19 years as the "MYD" for our program.  Spurred
on by MIT, which noticed a similar phenomenon, we explored
further.

<p>
Our first step was to calculate the median time spent
in our program for graduates in the specific years
considered by the Conference Board study.  We did
this, using our own database, and confirmed a value
in the 5 to 6 year range.

<p>
Next, with
the help of our Graduate School, we obtained data
directly from the National Research Council (NRC) Survey of
Earned Doctorates (SED) -- the
actual data that had been used as input to the Conference Board
study.
(Graduating students fill out an SED form, which is sent
to NRC.)
The SED form asks the student for a wide range of
data:  year of high school graduation, years of
attendance at every college (including 2-year) and
graduate institution where the student has spent time,
full-time-equivalent years as a student since receipt
of first Bachelor's degree, etc.
While there were of course a few glitches among our 60-odd
graduates over the multi-year reporting interval, overall
the return rate was very high and the quality of the data
was very good.
We calculated a variety of measures from this data, and
formed a variety of hypotheses.
<p>
Finally, NRC staff provided essential assistance by re-working
their calculations for our program and reviewing them with us.
Without this assistance -- way beyond the call of duty -- we
would still be speculating.  (Data gathering and analysis
for the study
are the responsibility of NRC's Office of Scientific and
Engineering Personnel, which looks at human resource issues
across all science and engineering fields.)

<h3>
What "MYD" Really Means
</h3>

As noted in the preamble, <b>the "MYD" measure in the
Conference Board study</b>, while widely interpreted to be
"the median number of years that students spend in this
graduate program," is in fact <b>"the median number of
years that elapse from when the student first enters any
educational program in any field at any institution after
receiving his/her Bachelor's degree, until the student receives
his/her Ph.D."</b>  In
disciplines or instances where significant employment
occurs between receipt of a Master's degree and entry
into a Ph.D. program, the difference can be huge.

<p>
This is not a measure that we've ever calculated for
our graduate program, nor is it a measure that we
consider particularly germane.  Surely, time spent
fully employed as part of a career plan, between
receiving a Master's degree from some other institution
and enrolling in our graduate program, is not
characteristic of our graduate program.  (Pushing a
bit harder, it's not even obvious that the time spent
in that Master's program elsewhere is germane, since we
don't require a Master's degree en route to the
Ph.D., and all students, regardless of background,
enter our program on an even footing in terms of
the "checkpoints" of the program.)

<p>
This is by far the greatest source of the discrepancy
between the "MYD" figure reported by the Conference
Board study and our own intuition about our graduate
program.
It's worth noting, though, that even when we use
the Conference Board's "MYD" definition and calculate
this measure from our own database, we obtain
results somewhat different from those of the Conference Board
study.  Several secondary factors contribute to this,
and they may be of interest.

<p>
First, the Conference Board study calculates an integer
number of years for each student, by subtracting the
calendar year of entry from the calendar year of exit.
A student who enters in September of Year X and
graduates in January of Year X+5 actually spent 4.33
years in the program, but will be reported as 5 -- a
small but consistent effect, since most students first
enroll in the fall.
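
<p>
A minimal Python sketch of this rounding effect follows.  The function
names and the choice of 1988 as "Year X" are hypothetical and serve
only to reproduce the September-to-January arithmetic above.  The
study's own computation is not shown here.

```python
def calendar_year_difference(entry_year, exit_year):
    """Whole-year figure obtained by subtracting calendar years."""
    return exit_year - entry_year

def elapsed_years(entry_year, entry_month, exit_year, exit_month):
    """Elapsed time in years, counted in whole months."""
    months = (exit_year - entry_year) * 12 + (exit_month - entry_month)
    return months / 12

# Enter in September of Year X, graduate in January of Year X+5.
# X = 1988 is an arbitrary, made-up choice.
x = 1988
reported = calendar_year_difference(x, x + 5)   # 5
actual = elapsed_years(x, 9, x + 5, 1)          # 52 months, about 4.33 years
```

<p>
Because most students enter in the fall, the whole-year figure
is at least as large as the elapsed time for nearly every student,
so the effect does not average out.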

<p>
(It's worth noting, in this context, how the study
arrives at an "MYD" that is reported to two decimal
digits.  Those students who fall in the median year
are considered to have graduated uniformly across
that year, and based upon this, an offset within that
year is calculated to two digits and reported.)
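
<p>
The interpolation just described resembles the textbook median for
grouped data, which can be sketched in Python as below.  This is an
assumption-laden illustration, not the study's code:  in particular,
it assumes that students with a whole-year value of <i>k</i> are spread
uniformly over the interval from <i>k</i> to <i>k</i>+1, a convention
the study document may or may not share.  The function name and the
sample values are made up.

```python
from collections import Counter

def interpolated_median(whole_years):
    """Median of whole-year values, interpolated within the median year.

    Assumption (not confirmed by the study): students recorded with the
    value k are treated as spread uniformly over the interval [k, k + 1).
    """
    counts = Counter(whole_years)
    half = len(whole_years) / 2
    below = 0  # number of students in years before the current one
    for year in sorted(counts):
        in_year = counts[year]
        if below + in_year >= half:
            # Fraction of the way through this year's students at which
            # the halfway point of the whole population falls.
            return round(year + (half - below) / in_year, 2)
        below += in_year
    raise ValueError("no data")

# Made-up whole-year values for ten students (not our department's data).
sample = [5, 6, 6, 7, 8, 8, 8, 8, 9, 11]
result = interpolated_median(sample)
```

<p>
For this made-up sample, four students precede the median year of 8
and four fall within it, so the halfway point (the fifth of ten
students) lies one quarter of the way through that year and the result
is 8.25.  The two decimal places come entirely from the uniformity
assumption, not from any finer-grained measurement.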

<p>
Second, students occasionally mis-code themselves.  In
the case of our own program, four students coded themselves
as "Computer Engineering" rather than "Computer Science"
and were attributed to our Electrical Engineering department
in the study ... offset by four students we'd never heard of
who coded themselves "Computer Science!"

<p>
Third, students who omit essential fields from the
SED form must of course be omitted from the calculation.
This affected a non-negligible number of our graduates.

<p>
For simplicity, this explanation has been presented in the context
of the University of Washington Department of Computer
Science &amp; Engineering, but it applies to all programs
surveyed in the Conference Board study.


<h3>
Lessons
</h3>

<i>State definitions precisely.</i>  From the
Conference Board study document, one would be
unlikely to discern this definition of "MYD" and
its implications.

<p>
<i>Avoid using titles that will be assumed by many
to mean something other than what is really being reported.</i>  It
is better to choose a title with no obvious semantics than one with
the wrong obvious semantics.

<p>
<i>Be mindful of correct definitions when making
statements.</i>  Statements in the Conference Board
study document such as "It took graduates in the
1980s longer to earn a degree on average than
graduates of these programs took 10 years earlier"
would seem to contribute to misinterpretation.

<p>
<i>Consider the appropriateness of measures.</i>  Understanding
the definition of "MYD" will allow the community to
consider whether this is the most appropriate measure.  The
SED form includes a wide variety of data; "MYD" is
the measure that the Conference Board study
has <i>chosen</i> to calculate and use.

<p>
<i>Don't confuse accuracy and precision.</i>  The Conference
Board study reports to two decimal places a widely misunderstood
measure with lots of fuzz in it.

<p>
<i>Handling survey instruments is difficult.</i>  Coding
errors are inevitable.  If the community wants
reliable analyses, we are going to
have to take the time to verify that we are providing
reliable data.

<h3>
Acknowledgements
</h3>

Jeff Dean, a graduate student in our department,
noticed the anomalous figure reported for us immediately
after the <a href="http://cra.org">Computing Research
Association</a> placed
the <a href="http://cra.org/cgi-bin/RankCS">Conference Board
study's Computer Science information on the Web</a>.
(Juan Osuna at CRA was responsible for this
effort, and also provided much assistance in tracking
things down.)  John Guttag of MIT contacted me after noting
the same anomaly for his program, and furnished considerable
guidance.

<p>
At the University of Washington, contributions
came from Frankye Jones (our staff graduate program
advisor), Carl Ebeling (our faculty graduate program
advisor), Dale Johnson (Dean of the Graduate School),
and John Drew (Manager of Computer Services at the
Graduate School).

<p>
At the National Academy of Sciences, Charlotte
Kuh and Jim Voytuk of the Office of Scientific
and Engineering Personnel (the organization
responsible for the Conference Board study)
expended a large amount of time and patience
helping us understand what was going on.
It's important to note the magnitude and
complexity of the Conference Board study:  41
fields, 274 universities, 3,634 research-doctorate
programs, 78,000 faculty members, and, by 1993,
nearly 40,000 Ph.D.s awarded per year.  Marjory
Blumenthal of
the Computer Science and Telecommunications
Board also provided guidance.

<p>
<hr>
<p>

Related material:

<ul>
<li><a href="http://cra.org/cgi-bin/RankCS">Computer Science
data from the Conference Board study</a>, provided by
the Computing Research Association
<p>
<li><a href="http://cra.org">Computing Research
Association</a> home page
<p>
<li><a href="http://www.cs.washington.edu/homes/lazowska/production.html">Massy-Goldman report alleging
50% CSE Ph.D. over-production to be re-issued due
to flawed data</a>
<p>
<li><a href="http://www.cs.washington.edu/homes/lazowska/cra">Computing Research:  Driving Information
Technology and the Information Industry Forward</a>
<p>
<li><a href="http://www.nas.edu/nap/bookstore/0309050944.html"><i>Research-Doctorate
Programs in the United States:  Continuity
and Change</i></a>, National Academy Press, 1995.

</ul>

<address>
<hr>
<a href="mailto:lazowska@cs.washington.edu">lazowska@cs.washington.edu</a>
</address>

</body>
</html>
